Overview
Music is an important part of entertainment. Some say that music tells us what the artist wants to say, while popular music tells us what people want to hear. This project applies sentiment analysis to the lyrics of popular songs, and to Twitter posts about those songs, to find out what successful songs and lyrics have in common and how the emotions in songs affect the audience.
Subquestions:
1. What are the most frequent words used in Popular Songs?
2. What Has Changed in Popular Songs?
3. Who are the most popular singers on Billboard in the last ten years?
4. Is there a pattern among successful songs?
Data Collection and Data Cleaning
Music Data
This project collected music information and Twitter posts from several APIs.
billboard.py, a Python API for accessing music charts from Billboard.com, was used to collect the titles and artists’ names. With that information, PyLyrics, a Python module that fetches song lyrics from lyrics.wikia.com, was used to retrieve the lyrics.
The biggest data-cleaning problem in this part is the special characters in the titles and the artist lists:
Graph 1(Tableau)
We can see that very few songs use languages other than English. Songs in other languages were removed so that the sentiment analysis would be more accurate.
Comparison of the data before and after cleaning
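The two cleaning steps described here can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning code: the function names, the `ENGLISH_HINTS` word list, and the 0.2 threshold are all assumptions, and the language check is a crude heuristic that only looks at how many tokens are common English function words.

```python
import re
import unicodedata

# Hypothetical list of common English function words (an assumption,
# not taken from the project); a real filter would use a fuller list.
ENGLISH_HINTS = {"the", "and", "you", "i", "to", "a", "it", "my",
                 "in", "of", "is", "that", "on", "your", "we"}

def clean_field(text):
    """Normalize a title or artist string taken from a chart entry."""
    # Fold accented characters to their closest ASCII form.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Drop bracketed notes such as "(Remix)" or "[Live]".
    text = re.sub(r"[\(\[].*?[\)\]]", " ", text)
    # Keep only the part before "Featuring", "Feat." or "Ft.".
    text = re.split(r"\b(?:featuring|feat\.?|ft\.?)\b", text,
                    flags=re.IGNORECASE)[0]
    # Replace remaining symbols with spaces, then collapse whitespace.
    text = re.sub(r"[^A-Za-z0-9' ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def looks_english(lyrics, threshold=0.2):
    """Return True if enough tokens are common English function words."""
    tokens = re.findall(r"[a-z']+", lyrics.lower())
    if not tokens:
        return False
    hits = sum(token in ENGLISH_HINTS for token in tokens)
    return hits / len(tokens) >= threshold
```

Songs for which `looks_english` returns False would be dropped before the sentiment step. A dedicated language-detection library would be more reliable than this word-list check.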

Graph 4(Tableau)
1. What are the most frequent words used in Popular Songs?
Let’s start with some exploratory data analysis (EDA) plots. For the first sub-question, our team simply wants to find which words are used most frequently in the lyrics. We first draw a word cloud over all the songs in the Billboard Top 100.
Graph 8(matplotlib, Word Cloud), Source Code
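A word cloud scales each word by how often it occurs, so the underlying computation is a word-frequency count. The sketch below shows that count using only the standard library. The `STOP_WORDS` set and the tokenizing pattern are assumptions made for illustration, not the project's settings.

```python
import re
from collections import Counter

# Hypothetical stop-word list (an assumption); a real run would use a
# fuller list so that function words do not dominate the plot.
STOP_WORDS = {"the", "a", "an", "and", "i", "you", "me", "my",
              "to", "of", "in", "it", "is", "on", "that"}

def word_frequencies(lyrics_list, top_n=10):
    """Count words across all lyrics and return the top_n most common."""
    counts = Counter()
    for lyrics in lyrics_list:
        # Lowercase and keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", lyrics.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)
```

The returned list of (word, count) pairs is the kind of input a word cloud is drawn from.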
The plot shows that many of the most frequent words relate to love and romance. In addition, higher-ranking songs contain fewer offensive words; a possible reason is that these songs reach a wider audience, and listeners are less accepting of lyrics full of offensive words.
2. What Has Changed in Popular Songs?
Next, our team wants to study trends in the development of popular music. The plots show how song names have changed and what kinds of songs stay on Billboard longer.
Source: BillBoard Top 100 Weekly List, 1990 - 2010
Data Grabbed: Song Name, Artist, Number of Weeks Stayed on Billboard, Year, Peak Position, and Lyrics
1. How did the Song Name Change from 1990 to 2017?
- Calculated the Average Length of All Song Names and Visualized by Year
2. Is There Any Difference Between Top 50 Songs and Bottom 50 Songs?
- Calculated the Average Length Again, This Time Split by Top 50 and Bottom 50
- The Bottom 50 Shows More Variation
3. What Songs Are More Likely to Stay Popular for Longer?
- Measured by Number of Weeks Stayed on Billboard
4. What Lyrics Made Songs Popular from 2008 - 2018?
- Ranked artists by number of songs that were ever on BillBoard
How are these common words used by the Top 20 Artists?
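The average-length comparisons listed above could be computed along these lines. This is a sketch under stated assumptions: each song is a dict with hypothetical keys `title`, `year`, and `peak`; "length" is taken to mean the number of words in the title; and "Top 50" is read as a peak position of 50 or better. The original analysis may have defined any of these differently.

```python
from collections import defaultdict
from statistics import mean

def average_title_length(songs):
    """Average title length in words, grouped by (year, chart half).

    songs: iterable of dicts with 'title', 'year' and 'peak' keys
    (an assumed layout, not the project's actual schema).
    """
    groups = defaultdict(list)
    for song in songs:
        # Assumption: "Top 50" means the song peaked at position 50 or better.
        half = "top50" if song["peak"] <= 50 else "bottom50"
        groups[(song["year"], half)].append(len(song["title"].split()))
    return {key: mean(lengths) for key, lengths in sorted(groups.items())}
```

Plotting the values for each half against year gives the two lines being compared. Swapping `mean` for `statistics.pstdev` would quantify how much each half varies.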
3. Who are the most popular singers on Billboard in the last ten years?
Do singers with more songs on the chart rank higher?
Graph 16(Tableau)
According to the plot above, the answer is no. Now let us consider the relationship between the number of songs a singer has and the number of weeks those songs spend on the Billboard Top 100.

Graph 17(Tableau)
There appears to be a logarithmic relationship between a singer’s song count and the mean total weeks their songs spend on the Billboard chart.
4. What are the characteristics of popular songs’ lyrics?
Taylor Swift Example
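One way to check a relationship like this is to fit a curve of the form y = a + b ln(x) by ordinary least squares, where x is an artist's song count and y is the mean total weeks on the chart. The sketch below shows that fit in plain Python. It is an illustration of the technique, not the analysis behind the plot, and `fit_log_curve` is a name chosen here.

```python
import math

def fit_log_curve(xs, ys):
    """Least-squares fit of y = a + b * ln(x); returns (a, b).

    xs must be positive and contain at least two distinct values.
    """
    logs = [math.log(x) for x in xs]
    n = len(xs)
    mean_l = sum(logs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of a simple linear regression of y on ln(x).
    sxy = sum((l - mean_l) * (y - mean_y) for l, y in zip(logs, ys))
    sxx = sum((l - mean_l) ** 2 for l in logs)
    b = sxy / sxx
    a = mean_y - b * mean_l
    return a, b
```

If residuals from this fit show no clear trend, the logarithmic form is a reasonable description. A systematic pattern in the residuals would suggest another functional form.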
## # A tibble: 10 x 2
## track_title length
## <fct> <int>
## 1 Sad Beautiful Tragic 183
## 2 A Perfectly Good Heart 224
## 3 The Outside 227
## 4 State of Grace 231
## 5 A Place In This World 232
## 6 Breathe (Ft. Colbie Caillat) 234
## 7 Cold as You 242
## 8 Tied Together With A Smile 245
## 9 Invisible 248
## 10 Come Back... Be Here 267

Graph 18(ggplot), Source Code
The average word count per track is close to 375, and the chart shows that the largest number of songs fall between 345 and 400 words. The density plot shows that the distribution is close to normal.
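The per-track word counts behind these figures can be computed with a few lines of code. The sketch below assumes the lyrics are held in a dict mapping track title to lyrics text, which is a hypothetical layout. The function names are chosen here for illustration.

```python
import re
from statistics import mean

def track_word_counts(tracks):
    """Map each track title to the number of words in its lyrics."""
    return {title: len(re.findall(r"[A-Za-z']+", lyrics))
            for title, lyrics in tracks.items()}

def shortest_tracks(counts, n=10):
    """Return the n tracks with the fewest words, shortest first."""
    return sorted(counts.items(), key=lambda item: item[1])[:n]

def average_word_count(counts):
    """Mean word count across all tracks."""
    return mean(counts.values())
```

`shortest_tracks` corresponds to a ten-row listing of the shortest tracks, and `average_word_count` to the overall mean.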

Graph 19(ggplot), Source Code
Overall, the most frequent mood in the songs is positive. We can also see that Taylor Swift has expressed all kinds of emotions in her songs, with joy, anticipation and trust emerging as the top three.
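Emotion tallies of this kind are typically produced by looking each word up in a word-to-emotion lexicon and counting the tags. The sketch below illustrates the idea with a tiny made-up lexicon. `EMOTION_LEXICON` and its entries are placeholders invented for this example, not the lexicon used in the analysis.

```python
import re
from collections import Counter

# Made-up miniature lexicon for illustration only (an assumption);
# a real analysis would load a published word-emotion lexicon.
EMOTION_LEXICON = {
    "love": ["joy", "trust"],
    "smile": ["joy"],
    "hope": ["anticipation", "joy"],
    "tears": ["sadness"],
    "hate": ["anger", "disgust"],
}

def emotion_counts(lyrics, lexicon=EMOTION_LEXICON):
    """Count how often each emotion tag is triggered by words in lyrics."""
    counts = Counter()
    for token in re.findall(r"[a-z']+", lyrics.lower()):
        # Words missing from the lexicon contribute nothing.
        counts.update(lexicon.get(token, []))
    return counts
```

Summing these counters over all tracks and sorting by count gives a ranking of emotions like the one described here.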

Graph 22(ggplot), Source Code
We can see that joy has its largest share in 2010 and 2014. Overall, surprise, disgust and anger are the emotions with the lowest scores; however, compared with other years, 2017 has the largest contribution to disgust. As for anticipation, 2010 and 2012 have higher contributions than other years.
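To compare years that have different numbers of songs, raw emotion counts are usually converted to shares. The sketch below normalizes within each year, so each year's shares sum to 1. This is one possible reading of "share"; the plot may instead show each year's contribution to an emotion's total. The input layout is an assumption.

```python
def emotion_shares(yearly_counts):
    """Convert {year: {emotion: count}} into {year: {emotion: share}}."""
    shares = {}
    for year, counts in yearly_counts.items():
        total = sum(counts.values())
        # A year with no tagged words gets an empty mapping.
        shares[year] = ({emotion: n / total for emotion, n in counts.items()}
                        if total else {})
    return shares
```

With shares in hand, the year in which a given emotion peaks can be found by taking the maximum of that emotion's share across years.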